Logging With Context

Logging is probably the most familiar form of telemetry. Most of us started logging in the first program we ever wrote, when we printed Hello World! to STDOUT. Logging is the most natural first step in exposing the internal state of an application to an observer. Think about how many times we have added a print statement to an application to determine the value of a variable: we were logging. Printing simple log statements such as Hello World! can be helpful for beginners, but it does not provide the critical data we require to operate complex systems.

Logs can be powerful sources of telemetry data when they are enriched with data to provide context for the events they are describing. For example, if our log statements include a correlation ID in the log entry, we can use that data to associate the log entry with other observability data.

Application or system logs often consist of timestamped text records. These records come in a variety of structures, ranging from completely unstructured text to highly structured schemas with attached metadata. Logs are output in a variety of ways – single files, rotated files, or even to STDOUT. We need to be able to gather logs from multiple sources, transform and extract log data in a consumable format, and then export that transformed data for consumption/indexing.

In this lesson, we will discuss how to improve our logging, moving from plain text to structured log formats, and how to consume and export various log formats using OpenTelemetry. We will learn using Go, but the concepts presented are applicable to any language.

Our first log statement#

Let's start by using the standard Go log package to write Hello World!:

Standard log for Hello World

Observe the plain-text structure of the output and think about what we would need to do to parse it into a structured form. A relatively simple regular expression would do the job, but each time new data is added, the structure changes and the parser breaks. Additionally, the output carries very little information about the event itself or the context in which it occurred.

The Go standard library logger has several other functions available, but we will not dive deeply into them here. If you are interested in learning more, we suggest you read here in the official Go documentation. For the rest of this section, we will focus on structured and leveled loggers as well as the API described here on GitHub.

Structured and leveled logs with Zap#

Structured loggers have several benefits over text loggers. Structured logs have a defined schema of keys and values that can be more easily parsed than plain text. We can take advantage of the keys and values to embed rich information such as a correlation ID or other useful contextual information. Additionally, we can filter out keys that might not be applicable given the log context.

V-levels are an easy way to control the amount of information in a log. For example, an application may output extremely verbose debug logs at the -1 log level but only critical errors at a log level of 4.

There has been a movement in the Go community to standardize the structured and leveled log interface via logr. There are many libraries that implement the API described in the logr project. For our purposes, we'll focus on a single structured logging library, Zap, which also has a logr API implementation.

Let's take a look at the key functions in the Zap logger interface:

Key functions in Zap logger interface

The preceding interface provides an easy-to-use, strongly typed set of logging primitives. Let's see an example of structured logging with Zap:

Structured logging with Zap

The JSON structured output of the logger provides helpful, easy-to-parse, and contextual information through strongly typed keys and values. In the tracing section of this chapter, we will use these additional keys and values to embed correlation IDs to link our distributed traces with our logs.

We will not dive deeply into where to output logs (such as a filesystem, STDOUT, and STDERR) but instead assume that the application logs we wish to ingest will have a file representation.

Now that we are producing structured logs in our application, we can shift gears to ingesting, transforming, and exporting logs using OpenTelemetry.

Ingesting, transforming, and exporting logs using OpenTelemetry#

In this example of using OpenTelemetry to ingest, transform, and export logs, we will use docker-compose to set up an environment that simulates a Kubernetes host, with logs stored under /var/log/pods/*/*/*.log. The OTel Collector will act as an agent running on the host. The logs will be ingested from the files in the log path, routed to the appropriate operators in the filelog receiver, parsed per their particular format, have their parsed attributes standardized, and then be exported to STDOUT through the logging exporter.

For this demo, we will use the code located here on GitHub. Now let’s take a quick look at the layout of the demo directory:

Directory layout

The docker-compose.yml file contains the service definition where we will run the OTel Collector and mount the collector configuration and log files directory, varlogpods, to simulate the collector running on a Kubernetes host. Let's take a look at docker-compose.yml:

The docker-compose.yml file
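As a hedged sketch (the service name, image tag, and container-side paths here are assumptions consistent with the description above), the compose file might look roughly like:

```yaml
version: "3"
services:
  opentelemetry-collector:
    image: otel/opentelemetry-collector-contrib:latest  # pin a specific tag in practice
    command: ["--config=/etc/otel-collector-config.yml"]
    volumes:
      # Mount the collector configuration and the simulated pod logs.
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml
      - ./varlogpods:/var/log/pods:ro
```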

To run this demo, move to the chapter source code, cd into the logging directory, and run docker-compose up.

OTel Collector configuration#

The OTel Collector configuration file contains the directives for how the agent is to ingest, process, and export the logs. Let's dive into the configuration and break it down:

The OTel Collector configuration file
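As a hedged sketch of the overall shape of this file (the exclude path, exporter settings, and pipeline wiring are assumptions; the operator chain is omitted here), it might look roughly like:

```yaml
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    exclude:
      # Skip the collector's own logs to avoid a feedback loop.
      - /var/log/pods/*/otel-collector/*.log
    start_at: beginning
    include_file_path: true
    operators: []  # routing and parsing steps go here

exporters:
  logging:
    loglevel: debug

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [logging]
```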

The receivers section contains a single filelog receiver that specifies the directories to include and exclude. The filelog receiver will start from the beginning of each log file and include the file path for metadata extraction in the operators. Next, let's continue to the operators:

The operators which define steps for processing log files
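A hedged sketch of the first operator, the router (operator IDs and match expressions here are assumptions modeled on the upstream filelog Kubernetes example, and the exact expression syntax varies between collector versions):

```yaml
operators:
  - type: router
    id: get-format
    routes:
      # JSON bodies go to the Docker parser.
      - output: parser-docker
        expr: 'body matches "^\\{"'
      # Lines starting with a timestamp go to the CRI-O parser.
      - output: parser-crio
        expr: 'body matches "^[^ Z]+ "'
```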

The filelog operators define a series of steps for processing the log files. The initial step is a router operation that inspects the body of each log entry and determines which parser, named in the route's output field, will handle it. Each parser operator then extracts the timestamp from the record according to the particular format of the log entry. Let's continue to the parsers to see how they extract information from each log entry once routed:

Parsers extracting information
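A hedged sketch of the two parser operators (the regular expression, field names, and timestamp layouts are assumptions modeled on the upstream filelog Kubernetes example):

```yaml
  - type: regex_parser
    id: parser-crio
    # CRI-O lines: <time> <stream> <logtag> <message>
    regex: '^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
    output: extract_metadata_from_filepath
    timestamp:
      parse_from: time
      layout_type: gotime
      layout: '2006-01-02T15:04:05.000000000-07:00'

  - type: json_parser
    id: parser-docker
    # Docker entries are already JSON; only the time key and layout are needed.
    output: extract_metadata_from_filepath
    timestamp:
      parse_from: time
      layout: '%Y-%m-%dT%H:%M:%S.%LZ'
```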

For example, the parser-crio operator applies a regular expression to each log entry, parsing a time variable from the entry and specifying the time format for the extracted string. Contrast parser-crio with the parser-docker operator, which handles a JSON structured log format with a time key in each entry. Because the log is already structured, no regular expression is needed: the parser-docker operator only provides the key for the JSON entry and the layout of the time string. Each of the parsers outputs to the extract_metadata_from_filepath operator, which extracts attributes from the file path using a regular expression. Following the parsing and extraction of file path information, the metadata operation runs, adding the attributes gathered during the parsing steps to enrich the context for future querying. Finally, the restructure operation moves the log key extracted from each parsed log entry into the Body of the resulting log record.

Let's take a look at the CRI-O log format:
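A representative CRI-O log line (timestamp, stream, a partial/full tag, then the message; the message content is a made-up example) looks roughly like:

```
2021-02-16T08:59:31.252009327+00:00 stdout F log message content
```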

Now, let's look at the Docker log format:
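A representative Docker json-file log entry (the message and timestamp values are made-up examples) looks roughly like:

```json
{"log":"log message content\n","stream":"stdout","time":"2021-02-16T08:59:31.252009327Z"}
```

Note that the entry is already structured: the time key is what the parser-docker operator reads.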

The files are provided below, under the logging directory:

- readme.md
- docker-compose.yml
- otel-collector-config.yml
- varlogpods/

When the example is run with docker-compose up -d, we should see output like the following:

Expected output when running this example
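As a hedged illustration of the logging exporter's output (the exact layout varies between collector versions, and the attribute names and values here are assumptions), each normalized record looks roughly like:

```
LogRecord #0
Timestamp: 2021-02-16 08:59:31.252009327 +0000 UTC
Body: log message content
Attributes:
     -> stream: STRING(stdout)
     -> namespace: STRING(default)
     -> pod_name: STRING(example-pod)
     -> container_name: STRING(example-container)
Trace ID:
Span ID:
```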

As we can see from the preceding output, the OTel Collector has extracted the timestamp, body, and specified attributes from the metadata operator, building a normalized structure for the exported logging data, and exported the normalized structure to STDOUT.

We have accomplished our goal of ingesting, transforming, and extracting log telemetry, but we should also be asking ourselves how we can build a stronger correlation with this telemetry. As of now, the only correlations we have are time, pod, and container. We would have a difficult time determining the HTTP request or other specific information that led to this log entry. Note that Trace ID and Span ID are empty in the preceding output.
